Quickly generating billion-record synthetic databases

Similar resources

Generating Random Regular Graphs Quickly

There are various algorithms known for generating graphs with n vertices of given degrees uniformly at random. Unfortunately, none of them is of practical use for all degree sequences, even for those with all degrees equal. In this paper we examine an algorithm which, although it does not generate uniformly at random, is provably close to a uniform generator when the degrees are relatively smal...

Generating meaningful test databases

Testing is one of the most time-consuming and cost-intensive tasks in software development projects today. A recent NIST report [RTI02] estimated that software errors cost the economy of the United States of America between $22.2 and $59.5 billion in the year 2000. Consequently, in the past few years, many techniques and tools have been developed to reduce the high tes...

Quickly Generating Representative Samples from an RBM-Derived Process

Two learning algorithms were recently proposed – Herding and Fast Persistent Contrastive Divergence (FPCD) – which share the following interesting characteristic: they exploit changes in the model parameters while sampling in order to escape modes and mix better, during the sampling process that is part of the learning algorithm. We justify such approaches as ways to escape modes while approxim...

Record Linkage for Genealogical Databases

In this paper we describe past experience and outline current directions in performing record linkage over large genealogical databases. Record linkage is the problem of identifying multiple records that refer to the same real-world entity. In genealogical databases, it is the problem of identifying when individuals situated in different pedigrees refer to the sam...

Generating Databases for Query Workloads

To evaluate the performance of database applications and DBMSs, we usually execute workloads of queries on generated databases of different sizes and measure the response time. This paper introduces MyBenchmark, an offline data generation tool that takes a set of queries as input and generates database instances for which the users can control the characteristics of the resulting workload. Appl...

Journal

Journal title: ACM SIGMOD Record

Year: 1994

ISSN: 0163-5808

DOI: 10.1145/191843.191886